keyboard and mouse


JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse

Li, Muyao, Wang, Zihao, He, Kaichen, Ma, Xiaojian, Liang, Yitao

arXiv.org Artificial Intelligence

Recently, action-based decision-making in open-world environments has gained significant attention. Visual Language Action (VLA) models, pretrained on large-scale web datasets, have shown promise in decision-making tasks. However, previous work has primarily focused on action post-training, often neglecting enhancements to the foundational model itself. In response, we introduce a novel approach, Act from Visual Language Post-Training, which refines Visual Language Models (VLMs) through visual and linguistic guidance in a self-supervised manner. This enhancement improves the models' capabilities in world knowledge, visual recognition, and spatial grounding in open-world environments. Following the above post-training paradigms, we obtain the first VLA models in Minecraft that can follow human instructions on over 1k different atomic tasks, including crafting, smelting, cooking, mining, and killing. Our experiments demonstrate that post-training on non-trajectory tasks leads to a significant 40% improvement over the best agent baseline on a diverse set of atomic tasks. Furthermore, we demonstrate that our approach surpasses traditional imitation learning-based policies in Minecraft, achieving state-of-the-art performance. We have open-sourced the code, models, and datasets to foster further research. The project page can be found at https://craftjarvis.github.io/JarvisVLA.
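For illustration only, the sketch below shows how an instruction-following agent's text output might be decoded into keyboard-and-mouse events; the key:/mouse:/click: token format is an assumption made for this example, not JARVIS-VLA's actual action encoding.

```python
# Hypothetical action decoder: the token format ("key:w", "mouse:12,-3",
# "click:left") is invented for illustration and is NOT JARVIS-VLA's scheme.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str        # "key", "mouse_move", or "click"
    payload: tuple

def decode_action(token: str) -> Action:
    """Parse one model-emitted action token into a structured action."""
    if token.startswith("key:"):                    # e.g. "key:w" -> press the w key
        return Action("key", (token[4:],))
    if token.startswith("mouse:"):                  # e.g. "mouse:12,-3" -> move cursor by (dx, dy)
        dx, dy = map(int, token[6:].split(","))
        return Action("mouse_move", (dx, dy))
    if token.startswith("click:"):                  # e.g. "click:left" -> mouse button
        return Action("click", (token[6:],))
    raise ValueError(f"unrecognized action token: {token}")

# Example response for an atomic task like "chop the tree in front of you"
for tok in ["mouse:15,0", "key:w", "click:left"]:
    print(decode_action(tok))
```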


Cradle: Empowering Foundation Agents Towards General Computer Control

Tan, Weihao, Zhang, Wentao, Xu, Xinrun, Xia, Haochong, Ding, Ziluo, Li, Boyu, Zhou, Bohan, Yue, Junpeng, Jiang, Jiechuan, Li, Yewen, An, Ruyi, Qin, Molei, Zong, Chuqiao, Zheng, Longtao, Wu, Yujie, Chai, Xiaoqiang, Bi, Yifei, Xie, Tianbao, Gu, Pengjie, Li, Xiyun, Zhang, Ceyao, Tian, Long, Wang, Chaojie, Wang, Xinrun, Karlsson, Börje F., An, Bo, Yan, Shuicheng, Lu, Zongqing

arXiv.org Artificial Intelligence

Despite the success in specific scenarios, existing foundation agents still struggle to generalize across various virtual scenarios, mainly due to the dramatically different encapsulations of environments with manually designed observation and action spaces. To handle this issue, we propose the General Computer Control (GCC) setting to restrict foundation agents to interact with software through the most unified and standardized interface, i.e., using screenshots as input and keyboard and mouse actions as output. We introduce Cradle, a modular and flexible LMM-powered framework, as a preliminary attempt towards GCC. Enhanced by six key modules, Cradle can understand input screenshots and output executable code for low-level keyboard and mouse control after high-level planning, so that Cradle can interact with any software and complete long-horizon complex tasks without relying on any built-in APIs. Experimental results show that Cradle exhibits remarkable generalizability and impressive performance across four previously unexplored commercial video games, five software applications, and a comprehensive benchmark, OSWorld. Cradle is the first to enable foundation agents to follow the main storyline and complete 40-minute-long real missions in the complex AAA game Red Dead Redemption 2 (RDR2). Cradle can also create a city of a thousand people in Cities: Skylines, farm and harvest parsnips in Stardew Valley, and trade and bargain with a maximal weekly total profit of 87% in Dealer's Life 2. Cradle can not only operate daily software, like Chrome, Outlook, and Feishu, but also edit images and videos using Meitu and CapCut. Cradle greatly extends the reach of foundation agents by enabling the easy conversion of any software, especially complex games, into benchmarks to evaluate agents' various abilities and facilitate further data collection, thus paving the way for generalist agents.
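As a rough sketch of the GCC loop described above (screenshots in, keyboard and mouse actions out), the snippet below uses pyautogui for screen capture and input; lmm_plan() is a stub standing in for Cradle's multimodal planning modules, which are not reproduced here.

```python
# Minimal GCC-style step, assuming pyautogui is installed and a display is
# available. lmm_plan() is a placeholder, not Cradle's real planner.
import pyautogui

def lmm_plan(screenshot) -> str:
    """Placeholder for a multimodal-model call that returns executable
    keyboard/mouse code as a string."""
    return "pyautogui.press('esc')"              # e.g. dismiss a menu

def gcc_step():
    screenshot = pyautogui.screenshot()          # unified observation: pixels only
    action_code = lmm_plan(screenshot)           # high-level planning -> low-level code
    exec(action_code, {"pyautogui": pyautogui})  # unified action: keyboard and mouse

if __name__ == "__main__":
    gcc_step()
```

Executing model-generated code, as in the exec() call above, mirrors the abstract's point that Cradle outputs executable code for low-level control rather than relying on any built-in game or application API.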


Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines via Combinatorial Optimization

Esencan, Mert, Kumar, Tarun Advaith, Asanjan, Ata Akbari, Lott, P. Aaron, Mohseni, Masoud, Unlu, Can, Venturelli, Davide, Ho, Alan

arXiv.org Artificial Intelligence

Recent Large Language Models (LLMs) have demonstrated impressive capabilities at tasks that require human intelligence and are a significant step towards human-like artificial intelligence (AI). Yet the performance of LLMs at reasoning tasks has been subpar, and the reasoning capability of LLMs is a matter of significant debate. While it has been shown that the choice of prompting technique can alter an LLM's performance on a multitude of tasks, including reasoning, the best-performing techniques require human-crafted prompts written with knowledge of the task at hand. We introduce a framework for what we call Combinatorial Reasoning (CR), a fully automated prompting method in which reasons are sampled from an LLM pipeline and mapped into a Quadratic Unconstrained Binary Optimization (QUBO) problem. The framework investigates whether QUBO solutions can be profitably used to select a useful subset of the reasons to construct a Chain-of-Thought style prompt. We explore the acceleration of CR with specialized solvers. We also investigate the performance of simpler zero-shot strategies such as linear majority rule or random selection of reasons. Our preliminary study indicates that coupling a combinatorial solver to generative AI pipelines is an interesting avenue for AI reasoning and elucidates design principles for future CR methods.
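To make the reason-selection step concrete, here is a toy example with invented scores: sampled reasons are encoded as a QUBO (minimize x^T Q x over binary x) whose diagonal rewards frequently sampled reasons and whose off-diagonal terms penalize redundancy, and the best subset is found by brute force. The actual CR pipeline's QUBO construction and solvers may differ.

```python
# Toy QUBO-based reason selection; all scores and penalties are invented.
import itertools
import numpy as np

reasons = [
    "The object is denser than water, so it sinks.",   # relevant
    "A density above 1 g/cm^3 implies sinking.",       # relevant, overlaps with the first
    "The object is painted red.",                      # irrelevant, rarely sampled
]
frequency = np.array([0.9, 0.8, 0.2])   # how often each reason was sampled (assumed)
selection_cost = 0.5                    # flat penalty for including any reason (assumed)
overlap = np.zeros((3, 3))
overlap[0, 1] = overlap[1, 0] = 0.1     # mild redundancy penalty (assumed)

# QUBO: minimize x^T Q x over x in {0,1}^n.
Q = overlap.copy()
np.fill_diagonal(Q, selection_cost - frequency)

best_x, best_val = None, float("inf")
for bits in itertools.product([0, 1], repeat=len(reasons)):  # brute force; CR uses specialized solvers
    x = np.array(bits)
    val = float(x @ Q @ x)
    if val < best_val:
        best_x, best_val = x, val

selected = [r for r, keep in zip(reasons, best_x) if keep]
print("Reasons kept for the Chain-of-Thought prompt:", selected)  # drops the irrelevant reason
```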


How to Play Your Favorite Google Play Mobile Games on PC (2023)

WIRED

Hopefully, Google will add more titles soon. Any developers interested in making their Android games compatible can get started at the official Android developer's website. The Google Play Games for PC service works with Google Play Points, so you can earn points for purchases (including subscriptions and in-app purchases) just as you would on an Android device. Any points you accumulate can be redeemed for vouchers and special game offers in the Play Store. Once you start a game, you can press Shift and Tab to access the menu, where you can change the screen resolution, tweak the volume, and remap the game controls.


Effective Gesture Based Framework for Capturing User Input

Charan, Pabbathi Sri, Gupta, Saksham, Agrawal, Satvik, Sindhu, Gadupudi Sahithi

arXiv.org Artificial Intelligence

Computers today aren't confined to laptops and desktops; mobile gadgets such as mobile phones also rely on computing. However, one input device that hasn't changed in the last 50 years is the QWERTY keyboard. Thanks to sensor technology and artificial intelligence, users of virtual keyboards can type on any surface as if it were a keyboard. In this research, we use image processing to build a vision-based virtual keyboard application on top of a novel framework that detects hand gestures with high accuracy while remaining sustainable and financially viable. A camera captures images of the keyboard and the user's finger movements, which together act as a virtual keyboard. This study also describes a vision-based virtual mouse that accepts finger coordinates as input. The system directly reduces peripheral cost, cuts the electronic waste generated by external devices, and provides accessibility to people who cannot use a traditional keyboard and mouse.
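As a loose sketch of the virtual-mouse idea, the snippet below maps normalized fingertip coordinates (as a hand detector might emit them) to screen pixels with simple smoothing and moves the cursor via pyautogui; detect_fingertip() is a placeholder for the paper's camera and image-processing pipeline, which is not reproduced here.

```python
# Illustrative fingertip-to-cursor mapping; detect_fingertip() is a stub.
import pyautogui

SCREEN_W, SCREEN_H = pyautogui.size()

def detect_fingertip(frame):
    """Placeholder: a real system would return the index-fingertip position
    from a camera frame, normalized to the range [0, 1] in both axes."""
    return 0.5, 0.5

def to_screen(u, v, prev=None, alpha=0.3):
    """Map normalized fingertip coordinates to screen pixels, with smoothing."""
    x, y = u * SCREEN_W, v * SCREEN_H
    if prev is not None:                     # exponential smoothing reduces jitter
        x = alpha * x + (1 - alpha) * prev[0]
        y = alpha * y + (1 - alpha) * prev[1]
    return x, y

if __name__ == "__main__":
    u, v = detect_fingertip(frame=None)
    x, y = to_screen(u, v)
    pyautogui.moveTo(x, y)                   # drive the cursor like a mouse
```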


Back-to-school shopping: How to buy the right computer for students of any age

USATODAY - Tech Top Stories

A new school season is upon us – cue the rolling eyes, students – and so you might be in the market for a new computer. Whether you're back in the classroom or continuing to learn online, it's simply the most important piece of tech to help you stay on your game. Problem is, how do you decide what kind of computer is for you? Not only are there varying prices, competing operating systems and countless brands to choose from, but the student – or the parent(s) footing the bill – must decide on an ideal form factor (or type of computer), such as a laptop, desktop, 2-in-1 or all-in-one. And you might think you need a degree in computer science just to understand today's specifications ("specs").


Google's DeepMind AI takes on StarCraft II

#artificialintelligence

At BlizzCon earlier this month in Anaheim, California, Blizzard announced an ambitious new project in collaboration with DeepMind, a leading artificial intelligence research company acquired by Google in 2014. After creating the AlphaGo AI that bested the world's top Go player earlier this year, DeepMind's next groundbreaking challenge will be StarCraft II. If DeepMind is able to build an AI that can learn to beat top players such as Byun "ByuN" Hyun Woo at this game's complex blend of real-time strategy, tactics and resource management, it would be a giant step forward in AI research. And with DeepMind's interest in using its research to solve hard problems in areas such as healthcare and energy efficiency on a massive scale, this StarCraft II project could impact the whole world. Soon after AlphaGo's Go victory, there were signs that DeepMind would take on StarCraft next. This was not lost on legendary StarCraft player/commentator and former competitive chess player Dan "Artosis" Stemkoski, for whom StarCraft seemed like the logical next step for AI research after games like chess and Go.


Developing a Language for Spoken Programming

Gordon, Benjamin M. (University of New Mexico)

AAAI Conferences

The dominant paradigm for programming a computer today is text entry via keyboard and mouse, but there are many common situations where this is not ideal. I address this through the creation of a new language that is explicitly intended for spoken programming. In addition, I describe a supporting editor that improves recognition accuracy by making use of type information and scoping to increase recognizer context.
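A toy illustration of the scoping idea above: when the recognizer proposes several candidate transcriptions for a spoken identifier, preferring candidates that name identifiers visible in the current scope resolves ambiguity. The ranking rule and scores below are invented for illustration and are not the editor's actual mechanism.

```python
# Hypothetical re-ranking of recognizer candidates using scope information.
def rank_candidates(candidates, scope_identifiers):
    """Order candidates so that in-scope identifiers beat out-of-scope ones,
    falling back to the recognizer's own confidence score."""
    in_scope = set(scope_identifiers)
    return sorted(candidates, key=lambda c: (c["name"] not in in_scope, -c["score"]))

scope = ["total", "count", "items"]        # identifiers visible at the cursor (assumed)
candidates = [                             # hypothetical recognizer output
    {"name": "counts", "score": 0.52},
    {"name": "count",  "score": 0.48},
]
print(rank_candidates(candidates, scope)[0]["name"])   # -> "count"
```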